Methods of Measuring Secular Trend
Methods of Measuring Trend: Overview and Comparison
Identifying and isolating the **secular trend** ($T_t$) is a crucial step in time series analysis. The trend represents the smooth, long-term movement of the series, abstracting from the shorter-term seasonal ($S_t$), cyclical ($C_t$), and irregular ($I_t$) variations. Various statistical and graphical methods have been developed to estimate this underlying trend component. The selection of a particular method depends on factors such as the visual pattern of the data, the assumed nature of the trend (e.g., linear or non-linear), the ease of computation, and the purpose of the analysis (e.g., descriptive analysis vs. forecasting).
The primary methods commonly used for measuring and estimating the trend component in a time series are:
- **Freehand Curve Method (Graphical Method):** A subjective method involving drawing a smooth curve visually through the plotted data.
- **Method of Semi-Averages:** A simple algebraic method that fits a straight line by averaging data in two halves of the series.
- **Moving Average Method:** A smoothing technique that replaces each data point with the average of itself and a fixed number of surrounding points.
- **Method of Least Squares:** A statistical method that fits a mathematical curve (like a straight line, parabola, etc.) to the data by minimizing the sum of squared deviations.
Each method has its own procedure, advantages, and limitations. Understanding these helps in choosing the most appropriate technique for a given time series dataset.
Comparison Overview
Here is a summary comparing the key characteristics of these four main methods for measuring trend:
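Method | Nature / Approach | Key Advantages | Key Disadvantages |
---|---|---|---|
Freehand Curve | Subjective, visual, graphical. Drawing a smooth curve through the data points. | Simple and quick; needs no calculation; flexible enough to follow non-linear trends. | Highly subjective; yields no trend equation; unreliable for forecasting. |
Method of Semi-Averages | Objective, algebraic. Fits a straight line by averaging the data in two halves. | Objective; simple arithmetic; gives a linear trend equation. | Fits only a straight line; sensitive to extreme values; relies on just two averages; drops the middle value when $n$ is odd. |
Moving Average Method | Objective smoothing technique. Averages data points over a fixed window. | Effective smoothing (especially of seasonality); follows non-linear trends; key step in decomposition. | Loses values at both ends; yields no trend equation; result depends on the choice of period $k$. |
Method of Least Squares | Objective, statistical curve fitting. Fits a mathematical curve by minimizing squared errors. | Objective; mathematically grounded; gives an explicit trend equation suitable for forecasting. | The form of the curve must be chosen in advance; more computation; sensitive to extreme values. |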
For most analytical and forecasting purposes, the Method of Least Squares is preferred due to its objectivity, mathematical basis, and ability to provide an explicit trend equation. However, moving averages are widely used in decomposition for their smoothing properties, while the other two methods are simpler but have significant limitations.
Freehand Curve Method (Graphical Method)
Concept
The Freehand Curve Method is the simplest and most elementary technique for estimating the trend component of a time series. It is a purely graphical method and relies on the analyst's visual judgment to draw a smooth curve or line that represents the underlying long-term movement of the data, while intentionally ignoring the short-term fluctuations (seasonal, cyclical, and irregular).
The underlying idea is that if you plot the data and visually average out the ups and downs, the resulting smooth line should approximate the general direction or path the series is following over the long run.
Procedure
The steps to apply the freehand curve method are straightforward:
- **Plot the Time Series Data:** Create a line graph of the time series with time ($t$) on the horizontal axis and the observed values ($Y_t$) on the vertical axis.
- **Visual Inspection:** Carefully examine the pattern of the plotted points. Look for the overall upward or downward slope and any apparent curvature.
- **Draw the Curve:** Using a pen or a drawing tool on the graph, draw a smooth curve (a straight line if the trend appears linear) through the middle of the plotted data points, rather than through each individual point. The curve should be drawn such that:
- It follows the general direction of the data.
- It smooths out the peaks and troughs of the short-term fluctuations.
- Ideally, approximately equal numbers of data points should lie above and below the drawn curve throughout its length.
- The sum of the vertical distances of the original data points from the drawn trend curve should be close to zero (positive deviations cancelling out negative ones).
- **Interpret the Curve:** The drawn curve represents the estimated trend line for the time series. Its slope and curvature indicate the nature of the long-term change.
*(Image shows a time series plot with visible fluctuations, and a smoother curve drawn through the center of the fluctuations, indicating the general upward trend.)*
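The last two drawing criteria can be checked numerically once the height of the drawn curve has been read off the graph at each time point. The sketch below uses a hypothetical helper, `check_freehand_fit` (not from any library), and made-up curve readings; it only tests a drawn curve against the stated criteria and does not remove the subjectivity of drawing it.

```python
def check_freehand_fit(observed, curve):
    """Compare observed values with heights read off a freehand trend curve.

    Returns (points_above, points_below, sum_of_deviations), where each
    deviation is the observed value minus the curve height at that time point.
    """
    if len(observed) != len(curve):
        raise ValueError("observed and curve must have the same length")
    deviations = [y - c for y, c in zip(observed, curve)]
    above = sum(1 for d in deviations if d > 0)
    below = sum(1 for d in deviations if d < 0)
    return above, below, sum(deviations)


# Made-up readings: a straight line drawn by eye from 100 up to 125.
observed = [100, 110, 105, 120, 115, 130]
curve = [100, 105, 110, 115, 120, 125]
print(check_freehand_fit(observed, curve))
```

Points lying exactly on the curve are counted neither above nor below. A small sum of deviations and roughly balanced counts suggest the drawn curve runs through the middle of the data.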
Advantages
- **Simplicity and Speed:** It is the easiest and quickest method to apply, requiring no mathematical calculations.
- **Intuitiveness:** The concept of visually smoothing the data is easy to grasp.
- **Flexibility:** It can be used to approximate trends of any shape (linear, non-linear, or even those with structural breaks) just by adjusting the curve drawn, without needing to specify a mathematical model beforehand.
Disadvantages
- **High Subjectivity:** This is the most significant drawback. The shape and position of the trend curve are entirely dependent on the individual analyst's judgment and drawing. Different people will draw different curves for the same data, leading to inconsistent results. This lack of objectivity makes it unsuitable for rigorous statistical analysis.
- **Non-Mathematical:** The method does not produce a mathematical equation describing the trend. This limits its analytical power and makes it impossible to quantify the rate of change precisely.
- **Unreliable for Forecasting:** Without a mathematical model, extrapolating the freehand curve to predict future values is highly unreliable and purely speculative.
- **Difficult for Comparison:** Comparing the trend of different series or the trend estimated by different analysts is problematic due to the lack of a uniform, objective measure.
In summary, while the freehand curve method provides a quick visual impression of the general trend, its inherent subjectivity makes it unsuitable for formal time series analysis, estimation, or forecasting. It is primarily a preliminary tool for initial exploration.
Summary for Competitive Exams - Methods of Measuring Trend (Overview & Freehand)
Measuring Trend ($T_t$): Estimating the long-term direction of a time series.
Main Methods:
- Freehand Curve (Graphical)
- Method of Semi-Averages (Simple Algebraic)
- Moving Average (Smoothing)
- Method of Least Squares (Statistical Curve Fitting)
Freehand Curve Method:
- Approach: Plot data, draw a smooth curve visually representing the average path.
- Advantages: Simple, quick, flexible (captures non-linear).
- Disadvantages: Highly subjective, not mathematical, unreliable for forecasting.
This method is only for preliminary visual assessment, not for formal analysis.
Method of Semi-Averages
Concept
The Method of Semi-Averages is a straightforward and objective technique for fitting a **straight line trend** to a time series. Unlike the freehand curve method, it provides a unique result for a given dataset. The underlying concept is to divide the time series into two equal (or approximately equal) halves and then find a single representative point for each half by calculating the arithmetic mean of the data values in that half. A straight line is then drawn through these two representative points.
This method simplifies the data by reducing each half of the series to a single average value plotted against the midpoint of the time periods it covers. The line connecting these two points is considered the estimated linear trend.
Procedure
The steps for applying the Method of Semi-Averages are as follows:
- **Divide the Data:** Split the entire time series data into two equal halves based on the number of time periods ($n$).
- If the number of time periods ($n$) is **even**, divide the data exactly into two halves, with $n/2$ periods in each half.
- If the number of time periods ($n$) is **odd**, the data cannot be divided exactly into two equal halves. In this case, the value for the **middle time period** is usually omitted, and the remaining $n-1$ data points are divided into two equal halves of $(n-1)/2$ periods each. The middle period is excluded to ensure that the two halves have the same number of observations.
- **Calculate Semi-Averages:** Compute the arithmetic mean (average) of the observed values ($Y_t$) for each of the two halves. Let these averages be denoted as $\bar{Y}_1$ (for the first half) and $\bar{Y}_2$ (for the second half).
- **Determine Time Points for Plotting:** Identify the time point that corresponds to the **center (middle)** of the periods included in each half.
- For the first half, the center time point ($t_1$) is the mid-point of the time periods in that half.
- For the second half, the center time point ($t_2$) is the mid-point of the time periods in that half. For example, if a half covers years 2018, 2019, 2020, the center is 2019. If a half covers 2021, 2022, 2023, 2024, the center is $(2022+2023)/2 = 2022.5$.
- **Plot the Semi-Average Points:** Plot the two points on the time series graph: $(t_1, \bar{Y}_1)$ and $(t_2, \bar{Y}_2)$.
- **Draw the Trend Line:** Draw a straight line passing directly through these two plotted points. This straight line represents the estimated linear trend using the method of semi-averages.
- **Find the Equation of the Trend Line (Optional but Recommended):** If needed, the equation of the straight line can be determined. A straight line has the form $T_t = a + bt$, where $T_t$ is the trend value at time $t$, $a$ is the intercept, and $b$ is the slope.
- The slope ($b$) is the change in $Y$ divided by the change in time between the two points: $$b = \frac{\bar{Y}_2 - \bar{Y}_1}{t_2 - t_1}$$
- The intercept ($a$) can be found by substituting one of the points $(t_1, \bar{Y}_1)$ or $(t_2, \bar{Y}_2)$ into the equation $T_t = a + bt$ and solving for $a$. For example, using $(t_1, \bar{Y}_1)$: $\bar{Y}_1 = a + b t_1 \implies a = \bar{Y}_1 - b t_1$. The time variable $t$ can be the actual year or a coded time variable (e.g., $t=1$ for the first year, $t=2$ for the second, etc.). Using coded time often simplifies calculations.
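The procedure above can be sketched in a few lines of Python. This is a minimal illustration, assuming equally spaced time points (so the mean of each half's time values equals its midpoint) and the convention above of omitting the middle period when $n$ is odd; `semi_average_trend` is a hypothetical name.

```python
def semi_average_trend(times, values):
    """Fit T_t = a + b*t by the method of semi-averages.

    If the number of periods is odd, the middle observation is omitted.
    Returns (a, b, (t1, y1), (t2, y2)).
    """
    n = len(values)
    if n != len(times) or n < 2:
        raise ValueError("need at least two (time, value) pairs of equal length")
    half = n // 2
    # For odd n, the slice for the second half starts one past the middle index.
    first_t, first_y = times[:half], values[:half]
    second_t, second_y = times[n - half:], values[n - half:]

    # For equally spaced times, the mean time equals the midpoint of each half.
    t1 = sum(first_t) / half
    t2 = sum(second_t) / half
    y1 = sum(first_y) / half   # first semi-average
    y2 = sum(second_y) / half  # second semi-average

    b = (y2 - y1) / (t2 - t1)  # slope through the two semi-average points
    a = y1 - b * t1            # intercept from the first point
    return a, b, (t1, y1), (t2, y2)


# Coded time t = 1, ..., 6 with the data of Example 1 below.
a, b, _, _ = semi_average_trend([1, 2, 3, 4, 5, 6], [100, 110, 105, 120, 115, 130])
print(f"T_t = {a:.3f} + {b:.3f} t")  # T_t = 93.889 + 5.556 t
```

Passing calendar years instead of coded time gives the same line with a different intercept.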
Example 1. Find the trend line using the method of semi-averages for the following data and write its equation:
Year | Value ($Y_t$) |
---|---|
2018 | 100 |
2019 | 110 |
2020 | 105 |
2021 | 120 |
2022 | 115 |
2023 | 130 |
Answer:
Given: Time series data for 6 years.
To Find: Trend line using Semi-Averages method and its equation.
Solution:
1. Divide Data: The number of years is $n=6$, which is even. Divide the data into two equal halves of $6/2 = 3$ years each.
- First Half: Years 2018, 2019, 2020 (Values: 100, 110, 105)
- Second Half: Years 2021, 2022, 2023 (Values: 120, 115, 130)
2. Calculate Semi-Averages:
- Average of the first half: $\bar{Y}_1 = \frac{100 + 110 + 105}{3} = \frac{315}{3} = 105$.
- Average of the second half: $\bar{Y}_2 = \frac{120 + 115 + 130}{3} = \frac{365}{3} \approx 121.67$.
3. Determine Time Points for Plotting:
- Center of the first half (2018, 2019, 2020) is the middle year: $t_1 = 2019$.
- Center of the second half (2021, 2022, 2023) is the middle year: $t_2 = 2022$.
We have two points for the trend line: (2019, 105) and (2022, 121.67).
4. Find the Equation of the Trend Line:
Let the trend equation be $T_t = a + bt$, where $t$ represents the year.
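The slope ($b$) is the change in value divided by the change in time:
$$b = \frac{\bar{Y}_2 - \bar{Y}_1}{t_2 - t_1} = \frac{121.67 - 105}{2022 - 2019} = \frac{16.67}{3} \approx 5.556$$
Carried exactly, $b = \frac{365/3 - 105}{3} = \frac{50}{9}$.
Now, find the intercept ($a$) using one of the points, say $(t_1, \bar{Y}_1) = (2019, 105)$, and the equation $T_t = a + bt$, substituting $t = 2019$, $T_t = 105$, $b = 50/9$:
$$105 = a + \frac{50}{9} \times 2019 = a + 11216.667$$
$$a = 105 - 11216.667 = -11111.667$$
The equation of the trend line is: $$T_t = -11111.667 + 5.556 t$$ (where $t$ is the year).
Because $t$ is a four-digit year here, the intercept is very sensitive to rounding: multiplying the rounded slope $5.556$ by 2019 would give $a \approx -11112.564$ instead. The unrounded slope $b = 50/9$ should therefore be used when $t$ is the actual year.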
Alternatively, we can use a coded time variable starting from 0 or 1 to simplify the intercept calculation. Let's use $t=1$ for 2018, $t=2$ for 2019, ..., $t=6$ for 2023.
The time points for plotting become the midpoints of the coded time periods:
- First half (t=1, 2, 3): Center is $(1+3)/2 = 2$. So, point is (2, 105).
- Second half (t=4, 5, 6): Center is $(4+6)/2 = 5$. So, point is (5, 121.67).
Slope ($b$):
$$b = \frac{121.67 - 105}{5 - 2} = \frac{16.67}{3} \approx 5.556$$
(exactly $50/9$; changing the time origin does not change the slope).
Find intercept ($a$) using point (2, 105), substituting coded $t = 2$, $T_t = 105$, $b = 50/9$:
$$105 = a + \frac{50}{9} \times 2 = a + 11.111$$
$$a = 105 - 11.111 = 93.889$$
The equation of the trend line is: $$T_t = 93.889 + 5.556 t$$ (where $t$ is the coded year, $t=1$ for 2018).
5. Plotting: Plot the original data and the line passing through (2019, 105) and (2022, 121.67) (or using coded time, through (2, 105) and (5, 121.67)).
*(Image shows the original 6 data points and a straight line drawn through the approximate positions of (2019, 105) and (2022, 121.67).)*
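Rounding errors are easy to introduce when $t$ is a calendar year, because a rounded slope gets multiplied by a number near 2000. As an illustrative check, Python's standard `fractions` module can keep the semi-averages of Example 1 exact and show how far the intercept moves if the slope is rounded first.

```python
from fractions import Fraction

# Semi-averages from Example 1, kept as exact fractions.
y1 = Fraction(100 + 110 + 105, 3)   # 105
y2 = Fraction(120 + 115 + 130, 3)   # 365/3
b = (y2 - y1) / (2022 - 2019)       # exact slope, 50/9

a_year = y1 - b * 2019              # intercept when t is the calendar year
a_coded = y1 - b * 2                # intercept when t = 1 for 2018
# Intercept obtained if the slope is first rounded to 5.556.
a_year_rounded_b = 105 - Fraction("5.556") * 2019

print(float(b), float(a_year), float(a_coded), float(a_year_rounded_b))
```

Both exact intercepts describe the same line: moving the time origin by 2017 years changes only $a$, since $a_{\text{year}} = a_{\text{coded}} - 2017\,b$.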
Example 2. Find the trend line using the method of semi-averages for the following data (Odd number of periods):
Year | Value ($Y_t$) |
---|---|
2017 | 50 |
2018 | 55 |
2019 | 65 |
2020 | 60 |
2021 | 70 |
2022 | 75 |
2023 | 80 |
Answer:
Given: Time series data for 7 years.
To Find: Trend line using Semi-Averages method.
Solution:
1. Divide Data: The number of years is $n=7$, which is odd. Omit the middle year (2020, the 4th year). The remaining $7-1=6$ years are divided into two equal halves of $6/2 = 3$ years each.
- First Half: Years 2017, 2018, 2019 (Values: 50, 55, 65)
- Second Half: Years 2021, 2022, 2023 (Values: 70, 75, 80)
2. Calculate Semi-Averages:
- Average of the first half: $\bar{Y}_1 = \frac{50 + 55 + 65}{3} = \frac{170}{3} \approx 56.67$.
- Average of the second half: $\bar{Y}_2 = \frac{70 + 75 + 80}{3} = \frac{225}{3} = 75$.
3. Determine Time Points for Plotting:
- Center of the first half (2017, 2018, 2019) is the middle year: $t_1 = 2018$.
- Center of the second half (2021, 2022, 2023) is the middle year: $t_2 = 2022$.
We have two points for the trend line: (2018, 56.67) and (2022, 75).
4. Plot the Trend Line: Plot these two points and draw a straight line passing through them. This line represents the estimated linear trend for the series using the method of semi-averages.
*(Image shows the original 7 data points and a straight line drawn through the approximate positions of (2018, 56.67) and (2022, 75). The point for 2020 might be noticeably off the line).*
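The example stops at the plotted line. As a sketch of the optional equation step, assume coded time with $t = 1$ for 2017 (a coding chosen here to mirror Example 1). The semi-average points then become $(2, 170/3)$ and $(6, 75)$:
$$b = \frac{75 - \frac{170}{3}}{6 - 2} = \frac{55/3}{4} = \frac{55}{12} \approx 4.583$$
$$a = \frac{170}{3} - 2 \times \frac{55}{12} = \frac{340}{6} - \frac{55}{6} = \frac{285}{6} = 47.5$$
$$T_t = 47.5 + 4.583\,t \quad (t = 1 \text{ for } 2017)$$
For 2020 ($t = 4$) this line gives $47.5 + 4 \times \frac{55}{12} \approx 65.83$, against an observed value of 60, which illustrates how the omitted middle value can sit noticeably off the line.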
Advantages
- **Objective:** Unlike the freehand method, the Method of Semi-Averages is objective. Given the data and the rule for handling odd periods, any analyst will arrive at the exact same two points and thus the same trend line.
- **Simple to Calculate:** The calculations involved (division into halves, simple arithmetic means) are very basic and easy to perform.
- **Provides a Linear Equation:** It results in a specific linear equation ($T_t = a + bt$) which can be used for simple trend projection, though its reliability is limited.
Disadvantages
- **Assumes Linearity:** This method can *only* fit a straight line. It cannot capture or represent non-linear trends (e.g., exponential growth, quadratic patterns) present in the data. If the actual trend is non-linear, the semi-averages line will be a poor fit.
- **Sensitive to Extreme Values:** Since it uses simple arithmetic means for each half, the trend line can be heavily influenced by any extreme values or outliers present in the first or second half of the data.
- **Limited Information Used:** The entire trend line is determined by just two points derived from summary averages. This can lead to a trend line that does not accurately represent the overall pattern of the data points between these two midpoints.
- **Loss of Data (for odd periods):** If the number of periods is odd, the value of the middle period is discarded from the calculation entirely.
Due to its assumption of linearity and sensitivity to extreme values, the Method of Semi-Averages is a relatively crude method and is not recommended for complex time series or when a high degree of accuracy is required. It is more useful as an illustrative method or for very simple trend analysis.
Summary for Competitive Exams - Method of Semi-Averages
Method of Semi-Averages: Simple, objective method to fit a linear trend line.
Procedure:
- Divide data into two equal (or near equal, omitting middle if odd) halves.
- Calculate mean ($ \bar{Y} $) for each half.
- Plot means at the time midpoint of their respective halves ($t_1, \bar{Y}_1$) and ($t_2, \bar{Y}_2$).
- Draw a straight line through these two points.
- Equation: $T_t = a + bt$, where $b = (\bar{Y}_2 - \bar{Y}_1) / (t_2 - t_1)$.
Advantages: Objective, simple calculation, provides a linear equation.
Disadvantages: **Assumes linear trend only**, sensitive to outliers, uses limited information (only two summary points), ignores middle point if $n$ is odd.
Suitable only for simple cases or illustration.
Moving Average Method (Calculation and Smoothing)
Concept
The **Moving Average Method** is a widely used technique in time series analysis primarily for **smoothing** the data. Its main purpose is to eliminate or significantly reduce the effect of short-term fluctuations, such as seasonal variations ($S_t$) and irregular variations ($I_t$), to reveal the underlying longer-term pattern, which is typically the combination of the trend ($T_t$) and the cyclical component ($C_t$).
The method works by calculating a sequence of arithmetic means (averages) of the data values over a fixed-size "window" of consecutive periods. This window moves forward one period at a time, generating a new average for each position of the window. The term "moving" highlights that the set of data points included in the average changes as the window shifts.
The sequence of moving averages obtained is considered an estimate of the **Trend-Cycle component** of the time series, which could be additive ($T_t + C_t$) or multiplicative ($T_t \times C_t$).
Procedure
The calculation of moving averages involves the following steps:
- **Choose the Period ($k$):** Select the length of the moving average window, denoted by $k$. The choice of $k$ is critical. To effectively smooth out seasonality, the period $k$ should ideally be equal to the length of the seasonal cycle (e.g., $k=12$ for monthly data with yearly seasonality, $k=4$ for quarterly data with yearly seasonality, $k=7$ for daily data with weekly seasonality). Using a period that corresponds to the cycle length ensures that each moving average includes exactly one full set of seasonal observations, thus averaging out the seasonal effect.
- **Calculate Moving Totals:** Sum the values of the first $k$ observations ($Y_1, Y_2, \dots, Y_k$). This is the first moving total. Then, shift the window one period forward, drop the first value ($Y_1$), and add the $(k+1)$-th value ($Y_{k+1}$) to the sum of the previous $k-1$ values ($Y_2, \dots, Y_k$) to get the second moving total ($\sum_{i=2}^{k+1} Y_i$). Continue this process throughout the series. The moving total for a window starting at time $t$ is $\sum_{i=t}^{t+k-1} Y_i$.
- **Calculate Moving Averages:** Divide each moving total by the period $k$. The moving average for a window starting at time $t$ is $\frac{1}{k} \sum_{i=t}^{t+k-1} Y_i$.
- **Center the Average (Crucial for Even Periods):** The moving average calculated in the previous step is located at the time point that represents the center of the $k$ periods included in the calculation.
- If the period $k$ is **odd** (e.g., 3-year, 5-year moving average), the center of the window falls precisely on a data point. For a 3-period moving average of $Y_1, Y_2, Y_3$, the average is centered at period 2. For a 5-period average of $Y_1, \dots, Y_5$, it's centered at period 3. In general, for odd $k$, the moving average of $Y_t, \dots, Y_{t+k-1}$ is centered at time $t + (k-1)/2$.
- If the period $k$ is **even** (e.g., 4-quarterly, 12-monthly moving average), the center of the window falls exactly *between* two consecutive time points. For a 4-period average of $Y_1, Y_2, Y_3, Y_4$, the average is centered between period 2 and period 3 (at time 2.5), so it is not aligned with any original observation. To align the moving averages with the original time points, a further step is required: calculate a **2-period moving average of the initial moving averages**. This process is called "centering the moving average". If two consecutive $k$-period averages are centered at times $t + 0.5$ and $t + 1.5$, their average is centered at $t + 1$, which is an original time point. For example, the 4-quarterly moving averages initially fall between quarters; averaging each adjacent pair places the result on a quarter. This is often referred to as a $k \times 2$ moving average (e.g., a $4 \times 2$ moving average for quarterly data).
The resulting series of (centered) moving averages provides a smoothed representation of the original time series, which is taken as the estimate of the Trend-Cycle component.
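Writing out the two averaging steps for $k = 4$ shows what centering does to the original observations. Let $M_{t-0.5}$ and $M_{t+0.5}$ denote (as notation introduced here) the two 4-period moving averages centered at times $t - 0.5$ and $t + 0.5$:
$$M_{t-0.5} = \frac{Y_{t-2} + Y_{t-1} + Y_t + Y_{t+1}}{4}, \qquad M_{t+0.5} = \frac{Y_{t-1} + Y_t + Y_{t+1} + Y_{t+2}}{4}$$
$$\text{Centered M.A.}_t = \frac{M_{t-0.5} + M_{t+0.5}}{2} = \frac{1}{8}Y_{t-2} + \frac{1}{4}Y_{t-1} + \frac{1}{4}Y_t + \frac{1}{4}Y_{t+1} + \frac{1}{8}Y_{t+2}$$
So a $4 \times 2$ centered moving average is a weighted 5-term average whose two end weights are half the others. In general, for even $k$ the centered average spans $k+1$ observations, with weight $\frac{1}{2k}$ on each of the two end values and $\frac{1}{k}$ on each of the $k-1$ inner values; the weights sum to $\frac{2}{2k} + \frac{k-1}{k} = 1$.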
Example 1. Calculate the 3-yearly moving averages for the following data:
Year | Value ($Y_t$) |
---|---|
2017 | 10 |
2018 | 12 |
2019 | 11 |
2020 | 14 |
2021 | 15 |
2022 | 13 |
2023 | 16 |
Answer:
Given: Yearly time series data.
To Find: 3-yearly moving averages.
Solution:
The period of the moving average is $k=3$, which is odd. The 3-year moving average will be centered at the middle year of the 3-year window.
Year (t) | Value ($Y_t$) | 3-Year Moving Total (Centered at middle year) | 3-Year Moving Average (Centered at middle year) |
---|---|---|---|
2017 | 10 | - | - |
2018 | 12 | $10+12+11 = 33$ | $33 / 3 = 11.00$ |
2019 | 11 | $12+11+14 = 37$ | $37 / 3 \approx 12.33$ |
2020 | 14 | $11+14+15 = 40$ | $40 / 3 \approx 13.33$ |
2021 | 15 | $14+15+13 = 42$ | $42 / 3 = 14.00$ |
2022 | 13 | $15+13+16 = 44$ | $44 / 3 \approx 14.67$ |
2023 | 16 | - | - |
The 3-yearly moving averages are calculated for the years 2018 through 2022. These values represent the estimated trend component for those respective years, having smoothed out shorter-term fluctuations.
Note that the moving average is calculated for $n-k+1 = 7-3+1 = 5$ periods (2018 to 2022). Data points are lost at the beginning and end.
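The moving-total update described in the procedure (drop the oldest value, add the next one) translates directly into code. This is a minimal sketch in plain Python; `moving_averages` is a hypothetical name, and for odd $k$ each result is labelled with the middle year of its window.

```python
def moving_averages(values, k):
    """Return k-period moving averages computed from a running moving total.

    The i-th result covers values[i] .. values[i + k - 1]; for odd k it is
    centered at index i + (k - 1) // 2 of the original series.
    """
    if not 1 <= k <= len(values):
        raise ValueError("k must be between 1 and the number of observations")
    total = sum(values[:k])                  # first moving total
    averages = [total / k]
    for i in range(k, len(values)):
        total += values[i] - values[i - k]   # drop the oldest value, add the next
        averages.append(total / k)
    return averages


years = list(range(2017, 2024))
values = [10, 12, 11, 14, 15, 13, 16]
k = 3
for offset, avg in enumerate(moving_averages(values, k)):
    print(years[offset + (k - 1) // 2], round(avg, 2))
```

The loop prints the same five averages as the table, labelled 2018 to 2022.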
Example 2. Calculate the 4-quarterly centered moving averages for the following quarterly data:
Year | Quarter | Value ($Y_t$) |
---|---|---|
2022 | Q1 | 30 |
2022 | Q2 | 40 |
2022 | Q3 | 35 |
2022 | Q4 | 50 |
2023 | Q1 | 35 |
2023 | Q2 | 45 |
2023 | Q3 | 40 |
2023 | Q4 | 55 |
Answer:
Given: Quarterly time series data.
To Find: 4-quarterly centered moving averages.
Solution:
The period of the moving average is $k=4$, which is even. We need to calculate a $4$-period moving average and then center it using a $2$-period moving average of the $4$-period averages.
In the table below, each 4-quarter moving total and average is placed on an intermediate row because it is centered between two quarters; the centered moving average falls on a quarter.

Year | Quarter | Value ($Y_t$) | 4-Quarter Moving Total | 4-Quarter Moving Average | 2-Period Moving Total of 4-Q M.A. | 4-Quarter Centered M.A. |
---|---|---|---|---|---|---|
2022 | Q1 | 30 | - | - | - | - |
2022 | Q2 | 40 | - | - | - | - |
2022 | *(between Q2 and Q3)* | - | $30+40+35+50 = 155$ | $155/4 = 38.75$ | - | - |
2022 | Q3 | 35 | - | - | $38.75+40.00 = 78.75$ | $78.75/2 = 39.375$ |
2022 | *(between Q3 and Q4)* | - | $40+35+50+35 = 160$ | $160/4 = 40.00$ | - | - |
2022 | Q4 | 50 | - | - | $40.00+41.25 = 81.25$ | $81.25/2 = 40.625$ |
2022/23 | *(between Q4 and Q1)* | - | $35+50+35+45 = 165$ | $165/4 = 41.25$ | - | - |
2023 | Q1 | 35 | - | - | $41.25+42.50 = 83.75$ | $83.75/2 = 41.875$ |
2023 | *(between Q1 and Q2)* | - | $50+35+45+40 = 170$ | $170/4 = 42.50$ | - | - |
2023 | Q2 | 45 | - | - | $42.50+43.75 = 86.25$ | $86.25/2 = 43.125$ |
2023 | *(between Q2 and Q3)* | - | $35+45+40+55 = 175$ | $175/4 = 43.75$ | - | - |
2023 | Q3 | 40 | - | - | - | - |
2023 | Q4 | 55 | - | - | - | - |

Timing of the centered averages: number the quarters 1 to 8 (2022 Q1 = 1, ..., 2023 Q4 = 8). The first 4-quarter moving average (periods 1 to 4) is centered at time 2.5 and the second (periods 2 to 5) at time 3.5, so their average, $(38.75 + 40.00)/2 = 39.375$, is centered at $(2.5 + 3.5)/2 = 3$, i.e. 2022 Q3. In the same way:
- $(40.00 + 41.25)/2 = 40.625$ is centered at $(3.5 + 4.5)/2 = 4$, i.e. 2022 Q4.
- $(41.25 + 42.50)/2 = 41.875$ is centered at $(4.5 + 5.5)/2 = 5$, i.e. 2023 Q1.
- $(42.50 + 43.75)/2 = 43.125$ is centered at $(5.5 + 6.5)/2 = 6$, i.e. 2023 Q2.

The 4-quarterly centered moving averages are therefore approximately 39.38 (2022 Q3), 40.63 (2022 Q4), 41.88 (2023 Q1), and 43.13 (2023 Q2). They provide a smoothed estimate of the trend-cycle component. With $n = 8$ and $k = 4$, there are $n - k = 4$ centered averages.
This method loses data points at both ends of the series. For odd $k$, $(k-1)/2$ points are lost at each end, a total of $k-1$, leaving $n-k+1$ moving averages. For even $k$ with 2-period centering, $k/2$ points are lost at each end, a total of $k$, leaving $n-k$ centered moving averages.
In Example 1 ($k=3$), we lost $(3-1)/2 = 1$ point at the start (2017) and 1 at the end (2023). Total $1+1=2$ points lost, which is $k-1$. Correct.
In Example 2 ($k=4$), we lost $4/2 = 2$ points at the start (2022 Q1, Q2) and $4/2 = 2$ points at the end (2023 Q3, Q4). Total $2+2=4$ points lost, which is $k$. Correct.
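The two-step centering for an even period can be sketched as follows. This is a minimal illustration in plain Python; `centered_moving_averages` is a hypothetical name, and the $j$-th result is aligned with index $j + k/2$ (0-based) of the original series, matching the timing worked out for Example 2.

```python
def centered_moving_averages(values, k):
    """Return k x 2 centered moving averages for an even period k.

    Step 1: k-period averages, each centered between two time points.
    Step 2: 2-period averages of consecutive step-1 values, which fall on
    original time points. The j-th result is aligned with index j + k // 2.
    """
    if k % 2 != 0 or not 2 <= k < len(values):
        raise ValueError("k must be even and smaller than the number of observations")
    first = [sum(values[i:i + k]) / k for i in range(len(values) - k + 1)]
    return [(first[j] + first[j + 1]) / 2 for j in range(len(first) - 1)]


labels = ["2022 Q1", "2022 Q2", "2022 Q3", "2022 Q4",
          "2023 Q1", "2023 Q2", "2023 Q3", "2023 Q4"]
values = [30, 40, 35, 50, 35, 45, 40, 55]
k = 4
for j, cma in enumerate(centered_moving_averages(values, k)):
    print(labels[j + k // 2], cma)
```

The function returns $n - k$ values, consistent with the $k/2$ points lost at each end.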
Advantages
- **Simple and Understandable:** The concept of averaging over a window is easy to grasp. Calculations are also straightforward.
- **Effective Smoothing:** It is very effective at smoothing out short-term fluctuations, particularly seasonality when the period $k$ is chosen appropriately (equal to the seasonal length).
- **Flexibility:** It does not assume a specific mathematical form for the trend (like linearity), allowing it to follow the general path of non-linear or complex trends.
- **Useful for Decomposition:** Moving averages, especially centered moving averages, are a key step in many classical time series decomposition methods to estimate the Trend-Cycle component.
Disadvantages
- **Loss of Data:** Moving averages cannot be calculated for the initial and final periods of the series, resulting in a loss of data points at both ends. For a $k$-period moving average, $(k-1)/2$ data points are lost at each end if $k$ is odd (total $k-1$ lost). For a $k \times 2$ centered moving average (where $k$ is even), $k/2$ points are lost at each end (total $k$ lost).
- **No Mathematical Equation:** The method produces a smoothed series of values but does not yield a single mathematical equation for the trend line or curve. This makes direct extrapolation for forecasting challenging, although some methods use the last calculated moving average as a projection base.
- **Subjectivity in Period Selection:** The choice of the period $k$ is critical and can introduce some subjectivity, especially if the seasonal or cyclical length is not precisely known or varies. An inappropriate period can lead to over-smoothing, under-smoothing, or distortion of the underlying patterns.
- **Sensitivity to Extreme Values:** Although smoothing reduces the impact, extreme values within a moving window can still affect the corresponding moving average.
- **May Not Isolate Trend from Cycle:** The moving average typically estimates the combined Trend-Cycle component. Separating the trend from the cycle after smoothing can be difficult without further analysis or assumptions.
Despite its limitations, the Moving Average method is a powerful visual and preliminary analytical tool and a fundamental step in classical time series decomposition. It is valued for its ability to reveal underlying patterns obscured by short-term noise.
Summary for Competitive Exams - Moving Average Method
Moving Average Method: Smoothing technique to estimate Trend-Cycle ($T \times C$ or $T+C$).
Procedure:
- Choose period $k$ (ideally seasonal length).
- Calculate moving totals over windows of size $k$.
- Calculate moving averages by dividing totals by $k$.
- Centering (if $k$ is even): Calculate a 2-period moving average of the initial moving averages.
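The procedure above can be sketched in Python. This is a minimal illustration, not a definitive implementation. It assumes the series is a plain list of numbers, and the function names `moving_average` and `centered_moving_average` and the quarterly figures are hypothetical, not taken from the text. The running total is updated by adding the value that enters the window and subtracting the one that leaves it, which mirrors the "moving totals" step.

```python
from typing import List


def moving_average(values: List[float], k: int) -> List[float]:
    """k-period moving averages: each moving total of k consecutive values,
    divided by k. Returns len(values) - k + 1 averages."""
    if not 1 <= k <= len(values):
        raise ValueError("k must be between 1 and the series length")
    total = sum(values[:k])                  # first moving total
    averages = [total / k]
    for i in range(k, len(values)):
        total += values[i] - values[i - k]   # slide the window forward one period
        averages.append(total / k)
    return averages


def centered_moving_average(values: List[float], k: int) -> List[float]:
    """Moving averages aligned with the original periods.

    Odd k: the k-period averages are already centered, so (k - 1) / 2
    points are lost at each end. Even k: a 2-period moving average of the
    k-period averages centers them, losing k / 2 points at each end."""
    first_pass = moving_average(values, k)
    if k % 2 == 1:
        return first_pass
    return moving_average(first_pass, 2)


# Hypothetical quarterly series (illustrative numbers only)
quarterly = [12, 18, 15, 11, 14, 20, 17, 13]
cma = centered_moving_average(quarterly, 4)
print(cma)   # [14.25, 14.75, 15.25, 15.75] -> 8 quarters in, 4 values out
```

With $k = 4$ the eight quarters yield only four centered averages, showing the loss of $k/2 = 2$ points at each end.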
Advantages: Simple, effective smoothing (especially of seasonality), flexible (follows non-linear trends), useful for decomposition.
Disadvantages: **Loses data points** at ends (total $k-1$ for odd $k$, $k$ for even $k$ centered); no mathematical equation for trend (hinders extrapolation); choice of $k$ is crucial; sensitive to outliers.
Provides a smoothed series, not an explicit trend line/curve equation.
Method of Least Squares (Fitting a Straight Line Trend)
Concept
The **Method of Least Squares** is a widely used mathematical and statistical technique for finding the "best fitting" straight line or curve to a set of data points. In the context of time series analysis, it is employed to fit a mathematical function that represents the long-term trend component ($T_t$). The most basic application is fitting a straight line trend, assuming that the trend component can be approximated by a linear relationship with time.
A straight line can be represented by the equation:
$$\mathbf{Y_c = a + bX} \qquad \text{(i)}$$
Where:
- $Y_c$ represents the calculated or predicted trend value for a given time point $X$.
- $X$ is the independent variable, representing time (e.g., year, quarter number, coded time).
- $a$ is the y-intercept, representing the estimated trend value when $X=0$.
- $b$ is the slope of the line, representing the estimated average change in the trend value ($Y_c$) for each unit increase in time ($X$).
The Method of Least Squares determines the values of the coefficients ($a$ and $b$) such that the sum of the squares of the vertical distances (or errors, or residuals) between the actual observed values ($Y$) and the values predicted by the trend line ($Y_c$) is minimized. That is, we want to find $a$ and $b$ that minimize $\sum (Y - Y_c)^2$ over all data points. Substituting $Y_c = a + bX$, the objective is to minimize:
$$\sum (Y - (a + bX))^2$$
Normal Equations
Using principles of calculus (specifically, finding the minimum by taking partial derivatives with respect to $a$ and $b$ and setting them equal to zero), the values of $a$ and $b$ that minimize the sum of squared errors can be found by solving the following system of two linear equations, known as the **normal equations**:
$$\sum Y = na + b\sum X \qquad \text{(II)}$$
$$\sum XY = a\sum X + b\sum X^2 \qquad \text{(III)}$$
Where:
- $Y$ represents the actual observed values of the time series.
- $X$ represents the corresponding time variable values (e.g., 1, 2, 3,... or coded time).
- $n$ is the total number of observations (data points) in the time series.
- $\sum Y$ is the sum of the actual $Y$ values.
- $\sum X$ is the sum of the time variable $X$ values.
- $\sum XY$ is the sum of the products of each $X$ value and its corresponding $Y$ value.
- $\sum X^2$ is the sum of the squares of the $X$ values.
- $a$ and $b$ are the coefficients we need to find.
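The calculus step described above can be written out explicitly. Let $S(a, b) = \sum (Y - a - bX)^2$ denote the sum of squared errors; setting its two partial derivatives to zero gives exactly Equations (II) and (III):

$$\begin{aligned}
\frac{\partial S}{\partial a} &= -2\sum (Y - a - bX) = 0 &&\implies \sum Y = na + b\sum X \\
\frac{\partial S}{\partial b} &= -2\sum X\,(Y - a - bX) = 0 &&\implies \sum XY = a\sum X + b\sum X^2
\end{aligned}$$

The first condition also shows that the residuals $Y - Y_c$ of a least-squares line always sum to zero.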
To find $a$ and $b$, you would typically calculate the sums $\sum Y, \sum X, \sum XY, \sum X^2$ from your data and then solve the two simultaneous equations (II and III).
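As a sketch of this direct route, the two normal equations can be solved by Cramer's rule (a standard way to solve a 2×2 linear system, named here as our choice of technique). The code assumes uncoded time $X = 1, 2, \dots, n$, and the function name `fit_line_normal_equations` is hypothetical. Applied to the sales data of Example 1 below, it gives a different intercept (because the origin differs) but the same slope and the same trend values.

```python
from typing import Sequence, Tuple


def fit_line_normal_equations(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Solve  sum(Y) = n*a + b*sum(X)  and  sum(XY) = a*sum(X) + b*sum(X^2)
    for (a, b) using Cramer's rule."""
    n = len(x)
    sx = sum(x)
    sy = sum(y)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    det = n * sxx - sx * sx          # zero only if all X values are equal
    if det == 0:
        raise ValueError("X values must not all be equal")
    a = (sy * sxx - sx * sxy) / det
    b = (n * sxy - sx * sy) / det
    return a, b


years = [2019, 2020, 2021, 2022, 2023]
sales = [70, 75, 80, 85, 90]
x = [year - 2018 for year in years]      # uncoded time: 1, 2, 3, 4, 5 (X = 0 at 2018)
a, b = fit_line_normal_equations(x, sales)
print(a, b)                               # 65.0 5.0
print(a + b * (2025 - 2018))              # 100.0, the same 2025 trend value
```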
Simplification using Coded Time
Solving simultaneous equations can be tedious, especially manually. The calculations for $a$ and $b$ are significantly simplified if the time variable $X$ is coded in such a way that its sum is zero, i.e., $\sum X = 0$. This is achieved by setting the origin (where $X=0$) at the center of the time series.
- If the number of periods ($n$) is odd: Assign $X=0$ to the middle period. Assign $X = -1, -2, -3, \dots$ to the periods preceding the middle period (counting backwards), and $X = 1, 2, 3, \dots$ to the periods succeeding it. Example: For years 2019, 2020, 2021, 2022, 2023 ($n=5$), code 2021 as $X=0$, 2020 as $X=-1$, 2019 as $X=-2$, 2022 as $X=1$, and 2023 as $X=2$. Then $\sum X = -2 - 1 + 0 + 1 + 2 = 0$.
- If the number of periods ($n$) is even: There is no single middle period; the center falls between the two middle periods. To make $\sum X = 0$, assign $X = \dots, -5, -3, -1$ to the periods before the center and $X = 1, 3, 5, \dots$ to the periods after it, using steps of 2 because each unit of $X$ represents half a period. Example: For years 2018 to 2023 ($n=6$), the center is between 2020 and 2021. Code 2020 as $X=-1$, 2019 as $X=-3$, 2018 as $X=-5$, and 2021 as $X=1$, 2022 as $X=3$, 2023 as $X=5$. Then $\sum X = -5 - 3 - 1 + 1 + 3 + 5 = 0$. Because the unit of $X$ is a half-period, a slope $b$ fitted with this coding measures the change per half-period; the change per full period is $2b$.
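The coding rule for odd and even $n$ can be sketched as a small helper. The name `coded_time` is hypothetical; it simply returns the $X$ values in chronological order, following the two cases above.

```python
from typing import List


def coded_time(n: int) -> List[int]:
    """Time codes X with sum(X) == 0, origin at the center of n periods.

    Odd n:  ..., -2, -1, 0, 1, 2, ...   (unit = 1 period)
    Even n: ..., -3, -1, 1, 3, ...      (unit = half a period)
    """
    if n < 1:
        raise ValueError("n must be positive")
    if n % 2 == 1:
        half = n // 2
        return list(range(-half, half + 1))
    return list(range(-(n - 1), n, 2))


print(coded_time(5))   # [-2, -1, 0, 1, 2]
print(coded_time(6))   # [-5, -3, -1, 1, 3, 5]
```

Because the codes are symmetric about zero, not only $\sum X$ but every odd-power sum ($\sum X^3$, $\sum X^5$, ...) is also zero.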
When the time variable $X$ is coded such that $\sum X = 0$, the normal equations simplify:
- From Equation (II): $\sum Y = na + b(0) \implies \sum Y = na \implies \mathbf{a = \frac{\sum Y}{n} = \bar{Y}}$ (The intercept $a$ is simply the mean of the $Y$ values).
- From Equation (III): $\sum XY = a(0) + b\sum X^2 \implies \sum XY = b\sum X^2 \implies \mathbf{b = \frac{\sum XY}{\sum X^2}}$ (The slope $b$ is the ratio of the sum of $XY$ products to the sum of squared $X$ values).
These simplified formulas make calculating $a$ and $b$ much easier, especially when using coded time originating at the series' center.
Example 1. Fit a straight line trend by the method of least squares to the following data and estimate the sales value for 2025.
Year | Sales ($\textsf{₹}$ Lakhs) ($Y$) |
---|---|
2019 | 70 |
2020 | 75 |
2021 | 80 |
2022 | 85 |
2023 | 90 |
Answer:
Given: Yearly sales data for 5 years.
To Find: Straight line trend equation using Least Squares and forecast for 2025.
Solution:
The number of years is $n=5$, which is odd. We will use coded time $X$ with the origin (where $X=0$) at the middle year, 2021. The equation of the trend line is $Y_c = a + bX$.
1. Set up the calculation table and code time (X):
Year | Sales ($\textsf{₹}$ Lakhs) ($Y$) | Time Code $X$ (Origin at 2021) | $XY$ | $X^2$ |
---|---|---|---|---|
2019 | 70 | -2 | $(-2) \times 70 = -140$ | $(-2)^2 = 4$ |
2020 | 75 | -1 | $(-1) \times 75 = -75$ | $(-1)^2 = 1$ |
2021 | 80 | 0 | $0 \times 80 = 0$ | $0^2 = 0$ |
2022 | 85 | 1 | $1 \times 85 = 85$ | $1^2 = 1$ |
2023 | 90 | 2 | $2 \times 90 = 180$ | $2^2 = 4$ |
Total ($\sum$) | $\sum Y=400$ | $\sum X=0$ | $\sum XY = -140 - 75 + 0 + 85 + 180 = 50$ | $\sum X^2 = 4 + 1 + 0 + 1 + 4 = 10$ |
Number of observations $n=5$. We observe that $\sum X = 0$, which simplifies the formulas for $a$ and $b$.
2. Calculate $a$ and $b$ using simplified formulas:
$$a = \frac{\sum Y}{n} = \frac{400}{5} = 80$$
[From simplified normal equation]
$$b = \frac{\sum XY}{\sum X^2} = \frac{50}{10} = 5$$
[From simplified normal equation]
3. Write the Trend Equation:
Substitute the calculated values of $a$ and $b$ into the trend equation $Y_c = a + bX$ (Equation i).
$$Y_c = 80 + 5X$$
(Origin: Year 2021; Unit of $X$: 1 Year)
This equation gives the estimated trend value for any year once that year is converted to its coded value $X$, i.e., the number of years from the origin 2021 (negative for earlier years).
4. Estimate Sales for 2025:
To estimate the sales for the year 2025, we need to find the corresponding value of the coded time variable $X$.
The origin $X=0$ is at year 2021. The year 2025 is $2025 - 2021 = 4$ years away from the origin in the positive direction.
So, for the year 2025, $X = 4$.
Substitute $X=4$ into the trend equation:
$$Y_c (2025) = 80 + 5 \times 4$$
(Substituting $X=4$ into trend equation)
$$Y_c (2025) = 80 + 20$$
$$Y_c (2025) = 100$$
The estimated sales for the year 2025, based on this linear trend model, are $\textsf{₹}100$ Lakhs.
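The hand calculation above can be checked with a short script. It is a sketch that assumes coded time with $\sum X = 0$ (here the odd-$n$ coding with origin 2021), so that the simplified formulas $a = \sum Y / n$ and $b = \sum XY / \sum X^2$ apply. The function name `fit_line_coded` is hypothetical.

```python
from typing import Sequence, Tuple


def fit_line_coded(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Least-squares line using the shortcuts that are valid only when sum(X) == 0."""
    if sum(x) != 0:
        raise ValueError("time codes must sum to zero")
    a = sum(y) / len(y)                                  # a = sum(Y) / n
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    b = sxy / sxx                                        # b = sum(XY) / sum(X^2)
    return a, b


sales = [70, 75, 80, 85, 90]      # 2019 to 2023
x = [-2, -1, 0, 1, 2]             # origin at 2021
a, b = fit_line_coded(x, sales)
print(a, b)                       # 80.0 5.0
x_2025 = 2025 - 2021              # X = 4
print(a + b * x_2025)             # 100.0
```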
Advantages
- **Objectivity:** Given a specific functional form for the trend (e.g., linear), the Method of Least Squares provides a unique and objective best-fitting curve according to the criterion of minimizing squared errors.
- **Mathematical Foundation:** It is based on a sound statistical principle and provides a precise mathematical equation for the trend ($Y_c = f(X)$).
- **Forecasting Capability:** The explicit mathematical equation allows for straightforward and objective extrapolation to estimate trend values for future periods.
- **Flexibility in Trend Shape:** While the example shows fitting a straight line, the method can be extended to fit various non-linear trend curves (like parabolas, exponential curves) by including higher powers of $X$ or transformations of $Y$ in the equation (e.g., $Y_c = a + bX + cX^2$ for a quadratic trend, $\log Y_c = a + bX$ for an exponential trend).
- **Basis for Advanced Models:** The principle of least squares forms the foundation for regression analysis and many advanced time series modeling techniques.
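As an illustration of the transformation idea for an exponential trend, the sketch below fits $\log Y_c = a + bX$ by applying the simplified least-squares formulas to $\log_{10} Y$ and then converting back. The data are hypothetical, generated from $Y = 100 \times 1.1^X$, so the fit should recover a base value of 100 and a growth factor of about 1.1 per period. Note that this minimizes squared errors in $\log Y$, not in $Y$ itself; the function name `fit_exponential_trend` is hypothetical.

```python
import math
from typing import Sequence, Tuple


def fit_exponential_trend(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Fit log10(Y) = intercept + slope * X by least squares.

    Assumes coded time with sum(X) == 0 and strictly positive Y.
    The trend is then Y_c = 10**intercept * (10**slope)**X."""
    if sum(x) != 0:
        raise ValueError("time codes must sum to zero")
    logs = [math.log10(v) for v in y]
    intercept = sum(logs) / len(logs)
    slope = sum(xi * li for xi, li in zip(x, logs)) / sum(xi * xi for xi in x)
    return intercept, slope


x = [-2, -1, 0, 1, 2]
y = [100 * 1.1 ** xi for xi in x]          # hypothetical, exactly exponential
intercept, slope = fit_exponential_trend(x, y)
print(round(10 ** intercept, 4), round(10 ** slope, 4))   # 100.0 1.1
```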
Disadvantages
- **Assumes Specific Form:** The method requires the analyst to assume a specific functional form (linear, quadratic, etc.) for the trend before fitting. If the chosen form does not match the true underlying trend, the fitted curve will be a poor representation. Visual inspection of the data plot is important for making an informed choice about the form.
- **More Complex Calculations:** While coding time simplifies calculations for basic forms, fitting non-linear trends or working without coded time can involve solving more complex systems of normal equations.
- **Sensitivity to Outliers:** Because the residuals are squared, large deviations receive disproportionate weight, so extreme values or outliers can pull the fitted line or curve away from the general pattern followed by the majority of points.
- **Assumes Homoscedasticity (for standard inference):** While least squares fitting is possible regardless, standard statistical inference (like confidence intervals for coefficients) assumes constant variance of errors (homoscedasticity), which might not always hold in time series.
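To see the outlier sensitivity concretely, the sketch below refits the sales data of Example 1 after replacing the 2023 value of 90 with a hypothetical outlier of 150. That single extreme point more than triples the slope, from 5 to 17. The helper `fit_line_coded` is hypothetical and assumes coded time with $\sum X = 0$.

```python
from typing import Sequence, Tuple


def fit_line_coded(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Least-squares line with coded time (sum(X) == 0): a = mean(Y), b = sum(XY) / sum(X^2)."""
    a = sum(y) / len(y)
    b = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)
    return a, b


x = [-2, -1, 0, 1, 2]                  # 2019 to 2023, origin at 2021
original = [70, 75, 80, 85, 90]
with_outlier = [70, 75, 80, 85, 150]   # hypothetical extreme value for 2023

print(fit_line_coded(x, original))      # (80.0, 5.0)
print(fit_line_coded(x, with_outlier))  # (92.0, 17.0)
```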
Despite some limitations, the Method of Least Squares is generally considered the most statistically sound of these methods for trend estimation when an appropriate functional form can be assumed. It is widely used in practice because of its objectivity, mathematical rigor, and usefulness for forecasting.
Summary for Competitive Exams - Method of Least Squares (Linear Trend)
Method of Least Squares: Objective statistical method to fit the "best" trend line/curve by minimizing $\sum (Y - Y_c)^2$.
Linear Trend Equation: $$Y_c = a + bX$$ (where $X$ is the time variable).
Normal Equations (to find $a$ and $b$):
- $$\sum Y = na + b\sum X$$
- $$\sum XY = a\sum X + b\sum X^2$$
Simplification using Coded Time ($\sum X = 0$):
- Odd $n$: $X=0$ at middle period.
- Even $n$: $X=-1, 1$ at two middle periods, then $-3, 3$, etc. (units of half-periods).
If $\sum X = 0$:
- $$a = \frac{\sum Y}{n}$$
- $$b = \frac{\sum XY}{\sum X^2}$$
Advantages: Objective, provides mathematical equation ($Y_c = a+bX$), allows forecasting, statistically sound, adaptable to non-linear forms.
Disadvantages: Assumes specific functional form (e.g., linear), calculations can be complex, sensitive to outliers.
Method of Least Squares (Fitting a Parabolic Trend)
Concept
While a straight line often serves as a reasonable approximation for the long-term trend, some time series exhibit a trend that is not linear. If the trend shows a consistent curvature (e.g., growth that steadily accelerates or steadily slows down, or a product lifecycle in which output rises, peaks, and then declines), a more complex function is needed to capture it. A common mathematical form used to model such a non-linear trend, particularly one with a single bend (one turning point), is a **parabola**, which is a second-degree polynomial. A parabola has no point of inflection, so it cannot represent an S-shaped trend (slow, then rapid, then slowing growth); such patterns require a higher-degree polynomial or another growth curve.
The equation for a parabolic trend is:
$$\mathbf{Y_c = a + bX + cX^2} \qquad \text{(i)}$$
Where:
- $Y_c$ is the calculated trend value for a given time point.
- $X$ represents the time variable (again, often coded for computational simplicity).
- $a$, $b$, and $c$ are constants (coefficients) that determine the shape and position of the parabola.
- The coefficient $a$ represents the value of $Y_c$ when $X=0$.
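- The coefficient $b$ is the slope of the curve at $X=0$ (the slope at any point is $b + 2cX$), so it measures the rate of change of the trend at the origin.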
- The coefficient $c$ determines the curvature of the parabola. If $c > 0$, the parabola opens upwards (U-shape). If $c < 0$, it opens downwards (inverted U-shape). If $c=0$, the equation reduces to a linear trend ($Y_c = a + bX$).
Similar to the linear case, the **Method of Least Squares** is used to find the values of $a$, $b$, and $c$ that make the fitted parabolic curve the "best fit" for the observed data points $Y$. This is achieved by minimizing the sum of the squares of the vertical differences (residuals) between the actual values ($Y$) and the trend values predicted by the parabolic equation ($Y_c$).
Minimize $\sum (Y - Y_c)^2 = \sum (Y - (a + bX + cX^2))^2$ over all data points.
Normal Equations
To find the values of $a$, $b$, and $c$ that minimize the sum of squared errors, we use calculus by taking partial derivatives of $\sum (Y - (a + bX + cX^2))^2$ with respect to $a$, $b$, and $c$ and setting these derivatives to zero. This process results in a system of three simultaneous linear equations, known as the **normal equations** for fitting a parabolic trend:
$$\sum Y = na + b\sum X + c\sum X^2 \qquad \text{(II)}$$
$$\sum XY = a\sum X + b\sum X^2 + c\sum X^3 \qquad \text{(III)}$$
$$\sum X^2Y = a\sum X^2 + b\sum X^3 + c\sum X^4 \qquad \text{(IV)}$$
Where:
- $n$ is the total number of data points.
- $\sum Y, \sum X, \sum X^2, \sum X^3, \sum X^4, \sum XY, \sum X^2Y$ are sums calculated from the observed data (Y) and the chosen time variable (X).
- $a, b, c$ are the unknown coefficients to be determined by solving these three equations.
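Before any simplification, the three normal equations are simply a 3×3 linear system. The sketch below assumes NumPy is available and uses `numpy.linalg.solve`, its general linear-system solver. It uses uncoded time $X = 1, \dots, 7$ on the production data of the example below; the function name `fit_parabola_normal_equations` is hypothetical. The coefficients differ from a coded-time fit because the origin is different, but the trend values, and hence the forecast, are the same.

```python
import numpy as np


def fit_parabola_normal_equations(x, y):
    """Solve normal equations (II)-(IV) for (a, b, c) in Y_c = a + bX + cX^2."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    p = [np.sum(x ** k) for k in range(5)]   # sums of X^0 (= n), X^1, ..., X^4
    lhs = np.array([
        [p[0], p[1], p[2]],                  # sum Y    = na      + b sumX   + c sumX^2
        [p[1], p[2], p[3]],                  # sum XY   = a sumX  + b sumX^2 + c sumX^3
        [p[2], p[3], p[4]],                  # sum X^2Y = a sumX^2 + b sumX^3 + c sumX^4
    ])
    rhs = np.array([np.sum(y), np.sum(x * y), np.sum(x ** 2 * y)])
    return np.linalg.solve(lhs, rhs)


production = [20, 25, 28, 30, 29, 27, 23]   # 2018 to 2024
x = np.arange(1, 8)                         # uncoded time, X = 1 for 2018
a, b, c = fit_parabola_normal_equations(x, production)
x_2026 = 2026 - 2017                        # X = 9
print(round(a + b * x_2026 + c * x_2026 ** 2, 4))   # 9.5
```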
Simplification using Coded Time
Solving three simultaneous equations can be complex. However, just as in the linear case, the calculations are considerably simplified if the time variable $X$ is coded such that the origin ($X=0$) is set at the center of the time series. If $X$ is coded symmetrically around the middle period (using increments of 1 for odd $n$ or increments of 2 for even $n$, as described for linear trend), then:
- The sum of $X$ values will be zero: $\sum X = 0$.
- The sum of odd powers of $X$ will also be zero: $\sum X^3 = 0, \sum X^5 = 0, \dots$.
If the time coding ensures $\sum X = 0$ and $\sum X^3 = 0$, the normal equations (II, III, IV) simplify as follows:
Equation (II): $\sum Y = na + b(0) + c\sum X^2 \implies \mathbf{\sum Y = na + c\sum X^2}$ ... (V)
Equation (III): $\sum XY = a(0) + b\sum X^2 + c(0) \implies \mathbf{\sum XY = b\sum X^2} \implies \mathbf{b = \frac{\sum XY}{\sum X^2}}$ ... (VI)
Equation (IV): $\sum X^2Y = a\sum X^2 + b(0) + c\sum X^4 \implies \mathbf{\sum X^2Y = a\sum X^2 + c\sum X^4}$ ... (VII)
With this simplification:
- The coefficient $b$ can be calculated directly using Equation (VI).
- Equations (V) and (VII) form a system of two linear equations with two unknowns ($a$ and $c$), which can be solved simultaneously.
This coded time approach significantly streamlines the process of fitting a parabolic trend using least squares.
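With coded time, the simplified Equations (V), (VI) and (VII) can be solved directly: $b$ from (VI), then $a$ and $c$ from (V) and (VII) by Cramer's rule. The sketch below is a hypothetical helper, `fit_parabola_coded`, that assumes integer data and integer codes (at least three periods) and uses Python's `fractions.Fraction` to keep the arithmetic exact; that is an implementation choice, not part of the method. Applied to the production data of the example below, it gives $a = 622/21 \approx 29.619$, $b = 1/2$ and $c = -19/21 \approx -0.9048$.

```python
from fractions import Fraction
from typing import Sequence, Tuple


def fit_parabola_coded(x: Sequence[int], y: Sequence[int]) -> Tuple[Fraction, Fraction, Fraction]:
    """Fit Y_c = a + bX + cX^2, assuming sum(X) == 0 and sum(X^3) == 0."""
    if len(x) < 3:
        raise ValueError("need at least three periods")
    if sum(x) != 0 or sum(v ** 3 for v in x) != 0:
        raise ValueError("time codes must be symmetric about zero")
    n = len(x)
    sx2 = sum(v ** 2 for v in x)
    sx4 = sum(v ** 4 for v in x)
    sy = sum(y)
    sxy = sum(v * w for v, w in zip(x, y))
    sx2y = sum(v ** 2 * w for v, w in zip(x, y))
    b = Fraction(sxy, sx2)                        # Eq. (VI)
    det = n * sx4 - sx2 ** 2                      # determinant of the (V)/(VII) system
    a = Fraction(sy * sx4 - sx2 * sx2y, det)
    c = Fraction(n * sx2y - sx2 * sy, det)
    return a, b, c


a, b, c = fit_parabola_coded([-3, -2, -1, 0, 1, 2, 3], [20, 25, 28, 30, 29, 27, 23])
print(a, b, c)   # 622/21 1/2 -19/21
```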
Example 1. Fit a parabolic trend ($Y_c = a + bX + cX^2$) by the method of least squares to the following data and estimate the value for 2026.
Year | Production (units) ($Y$) |
---|---|
2018 | 20 |
2019 | 25 |
2020 | 28 |
2021 | 30 |
2022 | 29 |
2023 | 27 |
2024 | 23 |
Answer:
Given: Yearly production data for 7 years.
To Find: Parabolic trend equation ($Y_c = a + bX + cX^2$) using Least Squares and forecast for 2026.
Solution:
The number of years is $n=7$, which is odd. We will use coded time $X$ with the origin (where $X=0$) at the middle year, 2021. The equation of the parabolic trend is $Y_c = a + bX + cX^2$.
1. Set up the calculation table and code time (X):
Year | Production ($Y$) | Time Code $X$ (Origin at 2021) | $X^2$ | $X^3$ | $X^4$ | $XY$ | $X^2Y$ |
---|---|---|---|---|---|---|---|
2018 | 20 | -3 | 9 | -27 | 81 | -60 | 180 |
2019 | 25 | -2 | 4 | -8 | 16 | -50 | 100 |
2020 | 28 | -1 | 1 | -1 | 1 | -28 | 28 |
2021 | 30 | 0 | 0 | 0 | 0 | 0 | 0 |
2022 | 29 | 1 | 1 | 1 | 1 | 29 | 29 |
2023 | 27 | 2 | 4 | 8 | 16 | 54 | 108 |
2024 | 23 | 3 | 9 | 27 | 81 | 69 | 207 |
Total ($\sum$) | $\sum Y=182$ | $\sum X=0$ | $\sum X^2=28$ | $\sum X^3=0$ | $\sum X^4=196$ | $\sum XY=14$ | $\sum X^2Y=652$ |
Number of observations $n=7$. We observe that $\sum X = 0$ and $\sum X^3 = 0$, which simplifies the formulas for $a, b, c$.
2. Calculate $a$, $b$, and $c$ using simplified normal equations:
From Equation (VI), $b = \frac{\sum XY}{\sum X^2}$:
$$b = \frac{14}{28} = 0.5$$
[From Eq. (VI)]
From Equations (V) and (VII), we have a system for $a$ and $c$:
$$\sum Y = na + c\sum X^2 \implies 182 = 7a + 28c$$
$$\sum X^2Y = a\sum X^2 + c\sum X^4 \implies 652 = 28a + 196c$$
We can solve this system. Divide the first equation by 7 and the second by 28 to simplify:
Equation (V'): $182/7 = a + (28/7)c \implies 26 = a + 4c$
Equation (VII'): $652/28 = a + (196/28)c \implies \frac{163}{7} = a + 7c$ (where $\frac{163}{7} \approx 23.2857$)
Now subtract (V') from (VII'):
$(a + 7c) - (a + 4c) = \frac{163}{7} - 26$
$3c = -\frac{19}{7}$
$$c = -\frac{19}{21} \approx -0.9048$$
Substitute $c = -\frac{19}{21}$ into Equation (V'): $26 = a + 4\left(-\frac{19}{21}\right)$
$26 = a - \frac{76}{21}$
$$a = 26 + \frac{76}{21} = \frac{622}{21} \approx 29.6190$$
So, $a \approx 29.6190$, $b = 0.5$, and $c \approx -0.9048$. Keeping the fractions until the end avoids accumulating rounding errors.
3. Write the Trend Equation:
Substitute the calculated values of $a$, $b$, and $c$ into the parabolic trend equation $Y_c = a + bX + cX^2$ (Equation i).
$$Y_c \approx 29.6190 + 0.5X - 0.9048X^2$$
(Origin: Year 2021; Unit of $X$: 1 Year)
This equation represents the estimated parabolic trend.
4. Estimate Production for 2026:
To estimate the production for the year 2026, we need to find the corresponding value of the coded time variable $X$.
The origin $X=0$ is at year 2021. The year 2026 is $2026 - 2021 = 5$ years away from the origin in the positive direction.
So, for the year 2026, $X = 5$.
Substitute $X=5$ into the trend equation, using the exact values of $a$ and $c$:
$$Y_c (2026) = \frac{622}{21} + 0.5(5) - \frac{19}{21}(5^2)$$
(Substituting $X=5$)
$$Y_c (2026) = \frac{622}{21} + 2.5 - \frac{475}{21}$$
$$Y_c (2026) = \frac{147}{21} + 2.5 = 7 + 2.5$$
$$Y_c (2026) = 9.5$$
The estimated production for the year 2026, based on this parabolic trend model, is $9.5$ units.
Since $c \approx -0.9048$ is negative, the parabola opens downwards, consistent with the production values rising and then falling towards the end of the observed period (2021 peak).
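As an independent cross-check of the hand calculation, NumPy's `numpy.polyfit` (assumed available) performs the same least-squares polynomial fit. It returns coefficients from the highest power down, i.e. in the order $(c, b, a)$, and `numpy.polyval` evaluates the fitted polynomial.

```python
import numpy as np

x = np.arange(-3, 4)                          # coded time, origin at 2021
production = np.array([20, 25, 28, 30, 29, 27, 23])
c, b, a = np.polyfit(x, production, deg=2)    # highest power first
print(round(a, 4), round(b, 4), round(c, 4))  # 29.619 0.5 -0.9048
print(round(np.polyval([c, b, a], 5), 4))     # 9.5  (X = 5 for 2026)
```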
Advantages
- **Objectivity:** Provides a unique best-fitting parabolic curve for the data based on the least squares criterion.
- **Captures Curvature:** Unlike the linear trend, it can effectively model trends that have a single bend or are accelerating/decelerating.
- **Mathematical Equation:** Yields an explicit mathematical equation ($Y_c = a + bX + cX^2$) for the trend, allowing for interpolation and extrapolation (forecasting).
- **Statistical Basis:** Rooted in statistical theory, providing a rigorous approach to curve fitting.
Disadvantages
- **Assumes Parabolic Form:** The method assumes that a parabola is an appropriate shape for the long-term trend. If the actual trend follows a different pattern (e.g., exponential, multiple bends, or linear with a sudden shift), the parabolic fit will be inaccurate. Visual inspection of the data is crucial before deciding to fit a parabola.
- **More Complex Calculations:** Calculations are more involved than fitting a straight line, requiring more sums and solving a system of three normal equations (even with coded time, it requires solving two simultaneous equations for $a$ and $c$).
- **Extrapolation Risk:** Although the equation provides a formula for forecasting, extrapolating a parabolic trend far beyond the observed data range can be risky. Past its turning point a parabola keeps falling (or rising) at an ever-increasing rate, so long-range forecasts can become implausible (for example, negative output), which may not be economically logical.
- **Sensitivity to Outliers:** Similar to the linear case, extreme data points can unduly influence the shape and position of the fitted parabola.
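The extrapolation risk can be made concrete with the parabola fitted in the worked example, using the exact solutions of its normal equations ($a = 622/21 \approx 29.619$, $b = 1/2$, $c = -19/21$, origin 2021). Evaluating it a few years past the data shows the trend falling to about zero in 2027 and turning negative in 2028, which is impossible for production. The turning point of the fitted parabola is at $X = -b/(2c)$. The helper name `trend` is hypothetical.

```python
from fractions import Fraction

# Coefficients from the worked example (origin 2021, unit of X: 1 year)
a, b, c = Fraction(622, 21), Fraction(1, 2), Fraction(-19, 21)


def trend(x):
    """Parabolic trend value Y_c = a + bX + cX^2."""
    return a + b * x + c * x ** 2


for year in range(2024, 2029):
    print(year, round(float(trend(year - 2021)), 2))
# 2024 22.98 / 2025 17.14 / 2026 9.5 / 2027 0.05 / 2028 -11.21

turning_x = -b / (2 * c)                     # vertex of the parabola
print("peak at X =", float(turning_x))       # about 0.28, shortly after the 2021 origin
```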
Fitting a parabolic trend using least squares is suitable when the time series data visibly exhibits a non-linear curvature that a second-degree polynomial can approximate. It offers a more flexible trend model than a straight line but requires careful consideration of the data's pattern and the potential pitfalls of extrapolation.
Summary for Competitive Exams - Methods of Measuring Trend
Methods for Measuring Trend ($T_t$):
- Freehand Curve: Simple, visual, **subjective**. No equation. (Least rigorous)
- Semi-Averages: Objective linear trend. Simple, but **assumes linearity only** and sensitive to outliers.
- Moving Average: **Smooths data**, estimates Trend-Cycle. Requires period $k$. **Loses end data**, no explicit equation. (Good for decomposition)
- Least Squares: **Objective**, fits mathematical curve by minimizing $\sum(Y-Y_c)^2$. Gives equation, allows forecasting.
- Linear Trend ($Y_c = a+bX$): Fits a straight line. Normal equations: $\sum Y = na + b\sum X$, $\sum XY = a\sum X + b\sum X^2$. Simplified if $\sum X=0$.
- Parabolic Trend ($Y_c = a+bX+cX^2$): Fits a curve with one bend. Normal equations: $\sum Y = na + b\sum X + c\sum X^2$, $\sum XY = a\sum X + b\sum X^2 + c\sum X^3$, $\sum X^2Y = a\sum X^2 + b\sum X^3 + c\sum X^4$. Simplified if $\sum X=0, \sum X^3=0$.
Time Coding ($X$): Symmetric coding around the center ($\sum X = 0$, and more generally $\sum X^k = 0$ for every odd power $k$) greatly simplifies Least Squares calculations for polynomial trends.
Least Squares is generally preferred for analytical rigor and forecasting when the trend form is appropriate.